Classification method and apparatus

ABSTRACT

Processing data derived from currency items includes measuring samples of currency items of a plurality of classes to produce feature vectors in a first space, and mapping the feature vectors to a second space in which there is a clearer separation of the classes.

[0001] The invention relates to a method and apparatus for classifying items. The invention is concerned especially with the classification of coins or banknotes.

[0002] Coins and banknotes inserted into mechanisms, such as vending machines, change machines and the like, are classified, on the one hand according to value, and/or on the other hand, between originals and copies or counterfeits thereof. Various methods of performing such classifications are known. One example is described in GB 2 238 152 A, the contents of which are incorporated herein by reference. In that example, measurements are taken from an inserted coin which represent different features of the coin, such as material and thickness. Those measurements are then compared with respective stored pairs of values, each pair of values corresponding to a respective acceptable denomination of coin. When each measured value falls within the respective range for a given denomination, the inserted coin is classified as belonging to that denomination.

[0003] In the type of classification discussed above, the measured values can be regarded as elements in a feature vector, and the acceptable measurements for different denominations correspond to regions in feature space, known as acceptance regions. In the example given above, the feature space is two-dimensional, and the acceptance regions are rectangles, but the feature space can have any number of dimensions, with corresponding complexity in the acceptance regions. For example, GB 2 254 949 A, the contents of which are incorporated herein by reference, describes ellipsoidal acceptance regions in three-dimensional feature space.

[0004] Other examples of methods and apparatus for classifying bills and coins are described in EP 0 067 898 A, EP 0 472 192 A and EP 0 165 734 A. Other methods of classification include the use of neural networks, as described, for example, in EP 0 553 402 A and EP 0 671 040 A, the contents of which are also incorporated herein by reference.

[0005] A significant problem in the classification of coins is the difficulty of separating different denominations. The population distributions of the different denominations of interest may be such that it is not possible easily to define appropriate acceptance boundaries which adequately separate the denominations. Another problem is that, in order to achieve adequate separation, it may be necessary to consider feature vectors having a large number of elements, which makes it more difficult to understand the various distributions and thus more difficult to obtain suitable acceptance boundaries. These problems are akin to general classification problems in data analysis, which have been studied and have led to various techniques, including statistical methods.

[0006] As an example of a statistical method of data analysis, principal component analysis ("PCA") is a method whereby data expressed in one space is transformed, using a linear transformation, into a new space where most of the variation within the data can be explained using fewer dimensions than in the first space. The method of PCA involves finding the eigenvectors and eigenvalues of the covariance matrix of the variables. The eigenvectors are the axes in the new space, with the eigenvector having the highest eigenvalue being the first "principal component", and so on in decreasing order of eigenvalue. Details of PCA can be found in textbooks on multivariate analysis, such as "Introduction to Multivariate Analysis" by Chatfield and Collins, see Chapter 4.
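By way of illustration only, and not forming part of the disclosure, the PCA procedure just described might be sketched as follows in Python (numpy assumed; the name `feature_vectors` is hypothetical):

```python
import numpy as np

def principal_components(feature_vectors: np.ndarray, n_components: int) -> np.ndarray:
    """Return the leading principal axes of the rows of `feature_vectors`."""
    centred = feature_vectors - feature_vectors.mean(axis=0)
    cov = np.cov(centred, rowvar=False)        # covariance matrix of the variables
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh suits the symmetric covariance matrix
    order = np.argsort(eigvals)[::-1]          # highest eigenvalue first
    return eigvecs[:, order[:n_components]]    # columns are the principal components
```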

[0007] Another method of data analysis for classification purposes is linear discriminant analysis ("LDA"). LDA is useful when it is known that the data falls into separate groups. LDA aims to transform the data into a new space so as to maximise the distance between the centre of each group of data as projected onto axes in the new space, and also to minimise the variance of each group along the axes. Methods for doing this are described in, for example, "Introduction to Statistical Pattern Recognition" by Fukunaga ("Fukunaga"). In one example, the maximisation is performed by finding a linear transformation which maximises the value of the trace of C⁻¹V, where V is the inter-class covariance matrix and C is the covariance matrix of all samples. As explained in Fukunaga, this amounts to finding the eigenvectors and eigenvalues of C⁻¹V. The eigenvectors are the axes of the new space. As also explained in Fukunaga, when there are N classes, the new space has N−1 dimensions.
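Again purely as an illustrative sketch (not part of the disclosure), the trace maximisation reduces to the eigenproblem of C⁻¹V, which might be coded as:

```python
import numpy as np

def lda_axes(samples: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Return the N-1 discriminant axes: leading eigenvectors of C^-1 V."""
    classes = np.unique(labels)
    overall_mean = samples.mean(axis=0)
    C = np.cov(samples - overall_mean, rowvar=False)   # covariance matrix of all samples
    V = np.zeros_like(C)                               # inter-class covariance matrix
    for c in classes:
        group = samples[labels == c]
        d = (group.mean(axis=0) - overall_mean)[:, None]
        V += (len(group) / len(samples)) * (d @ d.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(C) @ V)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:len(classes) - 1]].real   # N-1 axes for N classes
```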

[0008] In many situations, neither PCA nor LDA will give adequate separation of the groups of data. A further method of data analysis is non-linear component analysis (NCA), which is based on PCA. In NCA, the data is projected into a new space using a non-linear mapping, and then PCA is performed in the new space. Details of NCA are given in the article "Nonlinear Component Analysis as a Kernel Eigenvalue Problem" by Bernhard Schölkopf, Alexander Smola and Klaus-Robert Müller, Neural Computation 10, 1299-1319 (1998) ("Schölkopf").

[0009] A problem with NCA is that the dimension of the non-linear space may be very large, and so the number of principal components is also very large. For a given problem, it is not known how many principal components are needed for a good classification.

[0010] Generally, the invention relates to a method of deriving a classification for classifying items of currency, comprising measuring known samples for each class and deriving feature vectors from the measured samples, mapping the feature vectors to a second space in which there is a clearer separation of the different classes, and deriving a separating function using the separation in the second space.

[0011] More specifically, the present invention provides a method of deriving a classifier for classifying items of currency into two or more classes, comprising measuring known samples for each class and deriving feature vectors from the measured samples, selecting a function corresponding to a mapping of the feature vector space to a second space, mapping feature vectors to image vectors, deriving coefficients representing N−1 axes in the second space, where N is the number of classes, obtaining values representing the projections of the image vectors for the measured samples onto the N−1 axes, and using those values to derive a separating function for separating the classes equivalent to a separating function in the second space.

[0012] The invention also provides a method for classifying an item of currency comprising measuring features of the item, generating a feature vector from the measured values, and classifying the item using a classifier derived by a method according to any one of claims 1 to 6.

[0013] The invention also provides an apparatus for classifying items of currency comprising measuring means for measuring features of an item of currency, feature vector generating means for generating a feature vector from the measured values, and classifying means for classifying the item using a classifier derived according to the method of any one of claims 1 to 6.

[0014] The invention also provides an apparatus for classifying items of currency comprising measuring means for measuring features of an item of currency, feature vector generating means for generating a feature vector from the measured values, and classifying means for classifying the item using a function corresponding to a non-linear mapping of the feature vector space to a second, higher-dimensional space, mapping feature vectors to image vectors, coefficients representative of N−1 axes in the second space, where N is the number of classes that can be classified by the apparatus, and a function equivalent to a separating function in the second space.

[0015] An embodiment of the invention will be described with reference to the accompanying drawings, of which:

[0016] FIG. 1 is a block diagram of a classification system;

[0017] FIG. 2 is a graph showing a distribution of coin data; and

[0018] FIG. 3 is a graph showing a projection of the data of FIG. 2 onto new axes.

[0019] The invention will be described with reference to a coin validator.

[0020] In FIG. 1, box 1 designates a measuring system which includes an inlet 2, a transport system in the form of a coin inlet and coin transport path (not shown) for presenting a sample 3, and a sensor system (not shown) for measuring physical quantities of the sample. The measuring system 1 is connected to a processing system 4 by means of a data bus 5. The processing system 4 is connected to a classifier 6 by means of a data bus 7. The output of the classifier 6 is connected to a utilization system 8 by means of a data output bus 9. The utilization system 8 is in this example a vending machine, but may also be, for example, a money exchange machine.

[0021] The measuring system 1 measures features of an inserted coin 3. The measured features are assembled by the processing system 4 into a feature vector having n elements, where each element corresponds to a measured feature. In the present example, the sensor system measures values representative of the material, thickness and diameter of an inserted coin, using known techniques (see, for example, GB 2 254 949 A), and those values are the three elements of the corresponding feature vector. Briefly, each sensor comprises one or more coils in a self-oscillating circuit. In the case of the diameter and thickness sensors, a change in the inductance of each coil caused by the proximity of an inserted coin causes the frequency of the oscillator to alter, whereby a digital representation of the respective property of the coin can be derived. In the case of the conductivity sensor, a change in the Q of the coil caused by the proximity of an inserted coin causes the voltage across the coil to alter, whereby a digital output representative of the conductivity of the coin may be derived. Although the structure, positioning and orientation of each coil, and the frequency of the voltage applied thereto, are so arranged that the coil provides an output predominantly dependent upon a particular one of the properties of conductivity, diameter and thickness, it will be appreciated that each measurement will be affected to some extent by other coin properties.

[0022] Of course, many different features representative of items of currency can be measured and used as the elements of the feature vectors. For example, in the case of a banknote, the measured features can include the width of the note, the length of the note, and the intensity of reflected or transmitted light for the whole or part of the note. As an example, a measuring system can be arranged to scan a banknote along N lines using optical sensors. Each scan line contains L individual areas, which are scanned in succession. In each area, measurements are made of M different features. More specifically, for each area, measurements are made of the reflectance intensities of red, green and infra-red radiation. The total number of measurements for a banknote is therefore L×M×N. These measurements form the components of a feature vector for the respective specimen, so that the feature vector has L×M×N components. Alternatively, the measurements can be processed in a different way to obtain a feature vector representative of the measured specimen. For example, local feature vectors can be formed for each measured area, made up of the M measurements for that area, so that each local feature vector has M components. The local feature vectors can then be summed over the area of the banknote to obtain an M-dimensional feature vector representative of the entire specimen.
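As an illustrative sketch of the two alternatives just described (names hypothetical; the raw scan is assumed held as an array of shape (N, L, M), i.e. lines × areas × features):

```python
import numpy as np

def full_feature_vector(scan: np.ndarray) -> np.ndarray:
    """Flatten an (N, L, M) scan into a single feature vector of L*M*N components."""
    return scan.reshape(-1)

def summed_feature_vector(scan: np.ndarray) -> np.ndarray:
    """Sum the M-component local feature vectors over all areas of the note."""
    return scan.reshape(-1, scan.shape[-1]).sum(axis=0)   # M components
```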

[0023] The feature vector is then input to the classifier 6. The classifier 6 determines whether the sample belongs to any one of predetermined classes, using the feature vector and predetermined classification criteria including a separating function. If the sample is identified as belonging to an acceptable denomination of banknote, then it is accepted and the corresponding value of the note is credited. If the sample is identified as belonging to a known counterfeit group, it is rejected.

[0024] In this example, the system is for classifying two denominations of coins and one known counterfeit. A two-dimensional representation of the distribution in measurement space is shown in FIG. 2. The crosses represent samples of the first denomination, the dots represent counterfeits of the first denomination, and the circles represent samples of the second denomination.

[0025] The derivation of the separating function will be described below in general terms. The method of classification will then be described, also in general terms, followed by an explanation of the application of the general method to the specific example.

[0026] Briefly, a method for deriving a separating function according to an embodiment of the invention maps the input space, that is, the space of the measured feature vectors, using a non-linear map, into a higher-dimensional space with linear properties. Separating hyperplanes are constructed in the mapped space using training data, using the equivalent of an LDA analysis in the mapped space.

[0027] The population distributions of the denominations are analysed as discussed below.

[0028] Initially, samples of each of the denominations of interest and of each known counterfeit are measured, and corresponding feature vectors are formed. The feature vectors from the samples, when plotted, for example, on an n-dimensional scatter graph (where n is the number of measured features), fall roughly into groups, known as clusters. These measured samples are then used to derive a separating function, as described below. In this example, 50 samples for each denomination and 50 samples of the counterfeit are used.

[0029] Before proceeding further, a general explanation of the notation used is provided.

[0030] The input space, that is, the space of feature vectors, is defined as X, where

$X = \bigcup_{l=1}^{N} X_l$

[0031] and N is the number of clusters. The cardinality of the subspace $X_l$ is denoted by $n_l$, and the number of elements in X is M. Thus

$\sum_{l=1}^{N} n_l = M.$

[0032] $x^t$ denotes the transpose of the vector x.

[0033] In the input space, C is the covariance matrix, and

$C = \frac{1}{M}\sum_{j=1}^{M} x_j x_j^t \qquad (1)$

[0034] The method of the invention uses a kernel function k defining a dot product in a mapped space. Suppose φ is a non-linear function mapping X into a Hilbert space F:

$\varphi : X \rightarrow F, \quad x \mapsto \varphi(x) \qquad (2)$

[0035] and $k(x, y) = \varphi(x) \cdot \varphi(y) = \varphi^t(x)\,\varphi(y)$.

[0036] As will be clear from the following discussion, it is not necessary explicitly to construct φ for a given k; it can be shown, by Mercer's theorem, that if k is a continuous kernel of a positive integral operator, then a φ exists (see Schölkopf, Section 3 and Appendix C). Nor is it necessary to perform dot products explicitly in F, which may be an infinite-dimensional space.
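The point can be checked concretely for a kernel whose map φ is known in closed form: for the polynomial kernel k(x, y) = (x·y)² on two-dimensional inputs, φ maps into a three-dimensional F. A small illustrative verification (not part of the disclosure):

```python
import numpy as np

def phi(x: np.ndarray) -> np.ndarray:
    """Explicit map into F for k(x, y) = (x . y)^2 on 2-d inputs."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2.0) * x[0] * x[1]])

x, y = np.array([1.0, 2.0]), np.array([3.0, 0.5])
# The kernel evaluates the dot product in F without ever forming phi.
assert np.isclose(np.dot(x, y) ** 2, np.dot(phi(x), phi(y)))
```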

[0037] In F, V is the covariance matrix, and

$V = \frac{1}{M}\sum_{j=1}^{M} \varphi(x_j)\,\varphi^t(x_j) \qquad (3)$

[0038] We assume that the observations are centred in F, that is, that $\sum_{j=1}^{M} \varphi(x_j) = 0$.

[0039] A method of centering data will be described later.

[0040] B is the covariance matrix of the cluster centres, and

$B = \frac{1}{M}\sum_{l=1}^{N} n_l\, \bar{\varphi}_l\, \bar{\varphi}_l^t \qquad (4)$

[0041] where $\bar{\varphi}_l$ is the mean value of the cluster l, that is

$\bar{\varphi}_l = \frac{1}{n_l}\sum_{k=1}^{n_l} \varphi(x_{lk}) \qquad (5)$

[0042] where $x_{lj}$ is the element j of the cluster l.

[0043] B represents the inter-cluster inertia in F.

[0044] V can also be expressed using the clusters as

$V = \frac{1}{M}\sum_{l=1}^{N}\sum_{k=1}^{n_l} \varphi(x_{lk})\,\varphi^t(x_{lk}) \qquad (6)$

[0045] V represents total inertia in F.

[0046] Let $k_{ij} = k(x_i, x_j)$

[0047] and $(k_{ij})_{pq} = \varphi^t(x_{pi})\,\varphi(x_{qj}) \qquad (7)$

[0048] Let K be the (M×M) matrix defined on the cluster elements by

$K = (K_{pq})_{p=1\ldots N,\; q=1\ldots N} \quad \text{where} \quad K_{pq} = (k_{ij})_{i=1\ldots n_p,\; j=1\ldots n_q} \qquad (8)$

[0049] where $K_{pq}$ is the covariance matrix between cluster p and cluster q.

[0050] $K_{pq}$ is an $(n_p \times n_q)$ matrix,

[0051] and K is symmetric, so that $K_{pq}^t = K_{qp}$.

[0052] W is a block-diagonal weighting matrix, with

$W = (W_l)_{l=1\ldots N} \qquad (9)$

[0053] where $W_l$ is an $(n_l \times n_l)$ matrix with all terms equal to $\frac{1}{n_l}$.

[0054] W is an M×M block diagonal matrix.
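A sketch of how K (equation (8)) and W (equation (9)) might be assembled, assuming the samples are supplied grouped by cluster (the names and the `kernel` argument are illustrative):

```python
import numpy as np

def build_K(clusters: list, kernel) -> np.ndarray:
    """K is M x M; block (p, q) holds k(x_pi, x_qj), per equation (8)."""
    samples = np.vstack(clusters)
    return np.array([[kernel(x, y) for y in samples] for x in samples])

def build_W(clusters: list) -> np.ndarray:
    """W is M x M block diagonal; block l is (n_l x n_l) filled with 1/n_l (equation (9))."""
    sizes = [len(c) for c in clusters]
    M = sum(sizes)
    W = np.zeros((M, M))
    start = 0
    for n in sizes:
        W[start:start + n, start:start + n] = 1.0 / n
        start += n
    return W
```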

[0055] The method essentially performs linear discriminant analysis in the mapped space F, to maximise the inter-cluster inertia and minimise the intra-cluster inertia. This is equivalent to an eigenvalue resolution, as shown in Fukunaga. A suitable separating function can then be derived.

[0056] More specifically, the method involves finding the eigenvalues λ and eigenvectors v that satisfy

λVv=Bv  (10)

[0057] The eigenvectors are linear combinations of elements of F, and so there exist coefficients $\alpha_{pq}$ (p = 1 . . . N, q = 1 . . . $n_p$) such that

$v = \sum_{p=1}^{N}\sum_{q=1}^{n_p} \alpha_{pq}\, \varphi(x_{pq}) \qquad (11)$

[0058] The eigenvectors of equation (10) are the same as the eigenvectors of

$\lambda\, \varphi^t(x_{ij})\, V v = \varphi^t(x_{ij})\, B v \qquad (12)$

[0059] (see Schölkopf).

[0060] Using the definitions of K and W, and equations (6) and (11), the left-hand side of (12) can be expressed as follows:

$Vv = \frac{1}{M}\sum_{l=1}^{N}\sum_{k=1}^{n_l} \varphi(x_{lk})\,\varphi^t(x_{lk}) \sum_{p=1}^{N}\sum_{q=1}^{n_p} \alpha_{pq}\,\varphi(x_{pq}) = \frac{1}{M}\sum_{p=1}^{N}\sum_{q=1}^{n_p} \alpha_{pq} \sum_{l=1}^{N}\sum_{k=1}^{n_l} \varphi(x_{lk})\left[\varphi^t(x_{lk})\,\varphi(x_{pq})\right]$

and

$\lambda\,\varphi^t(x_{ij})\,Vv = \frac{\lambda}{M}\sum_{p=1}^{N}\sum_{q=1}^{n_p} \alpha_{pq} \sum_{l=1}^{N}\sum_{k=1}^{n_l} \left[\varphi^t(x_{ij})\,\varphi(x_{lk})\right]\left[\varphi^t(x_{lk})\,\varphi(x_{pq})\right]$

[0061] Applying this formula for all clusters i and for all elements j, we obtain:

$\lambda\left(\varphi^t(x_{11}), \ldots, \varphi^t(x_{1 n_1}), \ldots, \varphi^t(x_{ij}), \ldots, \varphi^t(x_{N 1}), \ldots, \varphi^t(x_{N n_N})\right) Vv = \frac{\lambda}{M}\, K K \alpha \qquad (13)$

[0062] where

$\alpha = (\alpha_{pq})_{p=1\ldots N,\; q=1\ldots n_p} = (\alpha_p)_{p=1\ldots N}$

[0063] with $\alpha_p = (\alpha_{pq})_{q=1\ldots n_p}$.

[0064] Using equations (4), (5) and (11), the right-hand side of (12) becomes:

$Bv = \frac{1}{M}\sum_{p=1}^{N}\sum_{q=1}^{n_p} \alpha_{pq} \sum_{l=1}^{N} n_l \left[\frac{1}{n_l}\sum_{k=1}^{n_l}\varphi(x_{lk})\right] \left[\frac{1}{n_l}\sum_{k=1}^{n_l}\varphi(x_{lk})\right]^t \varphi(x_{pq}) = \frac{1}{M}\sum_{p=1}^{N}\sum_{q=1}^{n_p} \alpha_{pq} \sum_{l=1}^{N} \left[\sum_{k=1}^{n_l}\varphi(x_{lk})\right] \frac{1}{n_l} \left[\sum_{k=1}^{n_l}\varphi^t(x_{lk})\,\varphi(x_{pq})\right]$

and

$\varphi^t(x_{ij})\,Bv = \frac{1}{M}\sum_{p=1}^{N}\sum_{q=1}^{n_p} \alpha_{pq} \sum_{l=1}^{N} \left[\sum_{k=1}^{n_l}\varphi^t(x_{ij})\,\varphi(x_{lk})\right] \frac{1}{n_l} \left[\sum_{k=1}^{n_l}\varphi^t(x_{lk})\,\varphi(x_{pq})\right]$

[0065] For all clusters i and for all elements j we obtain:

$\left(\varphi^t(x_{11}), \ldots, \varphi^t(x_{1 n_1}), \ldots, \varphi^t(x_{ij}), \ldots, \varphi^t(x_{N 1}), \ldots, \varphi^t(x_{N n_N})\right) Bv = \frac{1}{M}\, K W K \alpha \qquad (14)$

[0066] Combining (13) and (14), we obtain:

[0067] $\lambda\, K K \alpha = K W K \alpha$

[0068] Thus,

$\lambda = \frac{\alpha^t K W K \alpha}{\alpha^t K K \alpha} \qquad (15)$

[0069] K can be decomposed as K = QR (Wilkinson, 1971), so that Kα = QRα.

[0070] R is upper triangular and Q is orthonormal, that is, $Q^t Q = I$.

[0071] Q is an M×r matrix and R is an r×M matrix, where r is the rank of K. It is known that the QR decomposition always exists for a general rectangular matrix.

[0072] Then, let

$R\alpha = \beta \qquad (16)$

[0073] As the rows of R are linearly independent, for a given β there exists at least one solution α.

[0074] Hence Kα = Qβ and $\alpha^t K = \beta^t Q^t$ (K is symmetric).

[0075] Substituting in (15):

$\lambda = \frac{\beta^t Q^t W Q \beta}{\beta^t Q^t Q \beta} \qquad (17)$

[0076] Q is orthonormal, so

$\lambda\beta = Q^t W Q \beta \qquad (18)$

[0077] Equation (18) is in the form of a standard eigenvector equation. As K is singular, the QR decomposition permits working on a subspace, which simplifies the resolution.

[0078] The coefficients α can then be derived from β using equation (16), and the eigenvectors then follow from equation (11).

[0079] These coefficients α are normalised by requiring that the corresponding vectors v in F be normalised, that is:

$v^t v = 1 \qquad (19)$

[0080] or, from equation (11):

$v^t v = \sum_{p=1}^{N}\sum_{q=1}^{n_p}\sum_{l=1}^{N}\sum_{h=1}^{n_l} \alpha_{pq}\,\alpha_{lh}\, \varphi^t(x_{pq})\,\varphi(x_{lh}) = \sum_{p=1}^{N}\sum_{l=1}^{N} \alpha_p^t K_{pl}\, \alpha_l = \alpha^t K \alpha$

so (19) ⇒ $\alpha^t K \alpha = 1 \qquad (20)$
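Equations (16) to (20) can be gathered into a single solver. The following is a sketch only, under the assumptions above (K and W as built earlier; `numpy.linalg.qr` supplies the decomposition, and a least-squares solve stands in for "at least one solution α" of Rα = β):

```python
import numpy as np

def gda_coefficients(K: np.ndarray, W: np.ndarray, n_axes: int) -> np.ndarray:
    """Solve lambda*beta = Q^t W Q beta (18); recover alpha via (16); normalise per (20)."""
    Q, R = np.linalg.qr(K)                        # K = QR: Q orthonormal, R upper triangular
    eigvals, betas = np.linalg.eigh(Q.T @ W @ Q)  # standard symmetric eigenproblem (18)
    order = np.argsort(eigvals)[::-1][:n_axes]    # keep the N-1 leading eigenvectors
    alphas = []
    for beta in betas[:, order].T:
        alpha = np.linalg.lstsq(R, beta, rcond=None)[0]  # one solution of R alpha = beta (16)
        alpha /= np.sqrt(alpha @ K @ alpha)              # enforce alpha^t K alpha = 1 (20)
        alphas.append(alpha)
    return np.array(alphas)                       # one row of coefficients per axis
```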

[0081] The steps given above set out how to find the eigenvectors v ofequation (10).

[0082] As is known from linear discriminant analysis (see, for example, Fukunaga), the number of eigenvectors is N−1, where N is the number of clusters. The image of the clusters in the subspace spanned by the eigenvectors is found by projecting onto the eigenvectors. This is done using the following equation, for an eigenvector v and a feature vector x:

[0083] $\varphi^t(x)\, v = \sum_{p=1}^{N}\sum_{q=1}^{n_p} \alpha_{pq}\, \varphi^t(x_{pq})\, \varphi(x) = \sum_{p=1}^{N}\sum_{q=1}^{n_p} \alpha_{pq}\, k(x_{pq}, x) \qquad (21)$

[0084] As can be seen from the above, the calculation requires neither knowledge of φ nor the calculation of a dot product in F.
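Equation (21) in code: projecting a new item needs only kernel evaluations against the stored training samples. A sketch (the `alpha` row comes from a solver such as the one above):

```python
import numpy as np

def project(x: np.ndarray, training_samples: np.ndarray,
            alpha: np.ndarray, kernel) -> float:
    """Projection of phi(x) onto one eigenvector v, per equation (21)."""
    return float(sum(a * kernel(x_pq, x) for a, x_pq in zip(alpha, training_samples)))
```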

[0085] It has been shown in experiments that, by use of a suitable kernel function, the images of the clusters in the eigenvector subspace are well-separated and, more specifically, may be linearly separable, that is, they can be separated by lines, planes or hyperplanes.

[0086] A suitable separating function for classifying measured articles can then easily be derived using a known technique, such as inspection, averaging, Mahalanobis distance, or comparison with k nearest neighbours.

[0087] As mentioned previously, it was assumed that the observations are centred in F. Centering will now be discussed in more detail. Firstly, for a given observation $x_{ij}$, element j of the cluster i, the image $\varphi(x_{ij})$ is centred according to:

$\tilde{\varphi}(x_{ij}) = \varphi(x_{ij}) - \frac{1}{M}\sum_{l=1}^{N}\sum_{k=1}^{n_l} \varphi(x_{lk}) \qquad (22)$

[0088] We then have to define the matrix K̃ computed with the centred points:

[0089] $(\tilde{k}_{ij})_{pq} = \tilde{\varphi}^t(x_{pi})\,\tilde{\varphi}(x_{qj})$ for given clusters p and q. Thus:

$(\tilde{k}_{ij})_{pq} = \left[\varphi(x_{pi}) - \frac{1}{M}\sum_{l=1}^{N}\sum_{k=1}^{n_l}\varphi(x_{lk})\right]^t \left[\varphi(x_{qj}) - \frac{1}{M}\sum_{h=1}^{N}\sum_{m=1}^{n_h}\varphi(x_{hm})\right]$

$(\tilde{k}_{ij})_{pq} = (k_{ij})_{pq} - \frac{1}{M}\sum_{l=1}^{N}\sum_{k=1}^{n_l} (1_{ik})_{pl}(k_{kj})_{lq} - \frac{1}{M}\sum_{h=1}^{N}\sum_{m=1}^{n_h} (k_{im})_{ph}(1_{mj})_{hq} + \frac{1}{M^2}\sum_{l=1}^{N}\sum_{k=1}^{n_l}\sum_{h=1}^{N}\sum_{m=1}^{n_h} (1_{ik})_{pl}(k_{km})_{lh}(1_{mj})_{hq}$

$\tilde{K}_{pq} = K_{pq} - \frac{1}{M}\sum_{l=1}^{N} 1_{pl} K_{lq} - \frac{1}{M}\sum_{h=1}^{N} K_{ph} 1_{hq} + \frac{1}{M^2}\sum_{l=1}^{N}\sum_{h=1}^{N} 1_{pl} K_{lh} 1_{hq}$

$\tilde{K} = K - \frac{1}{M}\, 1_N K - \frac{1}{M}\, K 1_N + \frac{1}{M^2}\, 1_N K 1_N$

[0090] where we have introduced the following matrices:

[0091] $1_{pl} = (1_{ik})_{i=1\ldots n_p,\; k=1\ldots n_l}$, the $(n_p \times n_l)$ matrix whose elements are all equal to 1;

[0092] $1_N = (1_{pl})_{p=1\ldots N,\; l=1\ldots N}$, the (M×M) matrix whose elements are the block matrices $1_{pl}$.

[0093] Thus, for non-centred points $\varphi(x_{ij})$, we can derive K̃ from K and then solve for the eigenvectors of K̃. Then, for a feature vector x, the projection of the centred φ-image of x onto the eigenvectors ṽ is given by:

$\tilde{\varphi}^t(x)\,\tilde{v} = \sum_{p=1}^{N}\sum_{q=1}^{n_p} \tilde{\alpha}_{pq}\, \tilde{\varphi}^t(x_{pq})\, \tilde{\varphi}(x)$
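The final line of the derivation translates directly into code, since $1_N$, the (M×M) matrix of all-ones blocks, is simply the all-ones matrix. A sketch:

```python
import numpy as np

def centre_K(K: np.ndarray) -> np.ndarray:
    """K~ = K - (1/M) 1_N K - (1/M) K 1_N + (1/M^2) 1_N K 1_N."""
    M = K.shape[0]
    ones_N = np.ones((M, M))
    return K - ones_N @ K / M - K @ ones_N / M + ones_N @ K @ ones_N / M ** 2
```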

[0094] The above discussion sets out in general terms the method of general discriminant analysis. The general principles will now be illustrated with reference to the specific example of the coin validator.

[0095] Returning to the example of the coin validator at the beginning of the description, the feature vectors each have three elements, and there are three clusters, corresponding to each of the two denominations of interest and the known counterfeit respectively.

[0096] 50 samples of each denomination and 50 samples of the counterfeit are input to the measuring system 1. As previously mentioned, the sensor system measures the samples to obtain values representative of the thickness, material and diameter in each case. Corresponding feature vectors are formed from the measured features for each sample.

[0097] From the 50 sample feature vectors for each cluster, 37 are randomly selected for use in generating the separating function.

[0098] A kernel function is then chosen. The kernel function is chosen on the basis of trial and error, so as to choose whichever function gives the best separation results. There are a large number of kernel functions satisfying Mercer's theorem which may be suitable. Examples of kernel functions are the polynomial kernel:

[0099] $k(x, y) = (x \cdot y)^d$;

[0100] the Gaussian kernel:

$k(x, y) = \exp\left(-\frac{\|x - y\|^2}{\sigma^2}\right);$

[0101] the hyperbolic tangent kernel:

[0102] $k(x, y) = \tanh((x \cdot y) + \theta)$; and

[0103] the sigmoid kernel:

$k(x, y) = \frac{1}{1 + e^{-((x \cdot y) + \theta)}}.$
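The four kernels listed above, written out as a sketch (the parameters d, σ² and θ are to be tuned by the trial and error described; the defaults shown are arbitrary):

```python
import numpy as np

def polynomial_kernel(x, y, d=2):
    return np.dot(x, y) ** d

def gaussian_kernel(x, y, sigma2=0.01):
    return np.exp(-np.dot(x - y, x - y) / sigma2)

def tanh_kernel(x, y, theta=1.0):
    return np.tanh(np.dot(x, y) + theta)

def sigmoid_kernel(x, y, theta=1.0):
    return 1.0 / (1.0 + np.exp(-(np.dot(x, y) + theta)))
```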

[0104] In this example, the Gaussian kernel is used, with σ² = 0.01.

[0105] Using the selected samples and the kernel function, the matrices K and W are calculated (equations (8) and (9)).

[0106] Then K is decomposed using QR decomposition.

[0107] Then the eigenvalues λ and corresponding eigenvectors β are calculated (equation (18)).

[0108] Then coefficients α are calculated and normalised (equations (16)and (20)).

[0109] Thereafter, the feature vectors of the remaining 13 samples for each cluster are projected onto the eigenvectors v (equation (21)) and the results are plotted on a graph for easy inspection. In this example, there are 3 clusters, so there are 2 eigenvectors, and the separation is in 2-dimensional space. This is shown in FIG. 3. As can be seen, the clusters are well-separated. More specifically, each cluster is projected essentially onto one point, which is its centre of gravity. The separation of the projections of the clusters onto the eigenvectors is then analysed and used to derive a separating function. In this example, a linear separating function can easily be derived by inspection. For example, a suitable separating function is:

[0110] For eigenvectors v₁ and v₂, and an input vector x:

[0112] if $(\varphi^t(x)\,v_1) > 0$ and $(\varphi^t(x)\,v_2) > 0$, then x belongs to group 1 (that is, it is of the first denomination);

[0114] if $(\varphi^t(x)\,v_1) > 0$ and $(\varphi^t(x)\,v_2) < 0$, then x belongs to group 2 (that is, it is of the second denomination); and

[0116] if $(\varphi^t(x)\,v_1) < 0$ and $(\varphi^t(x)\,v_2) > 0$, then x belongs to group 3 (that is, it is a counterfeit of the first denomination).
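Expressed as a sketch on top of the `project` helper given earlier (group numbering as in the text; the fall-through return for any other sign pattern is an assumption, corresponding to rejection):

```python
def classify(x, training_samples, alphas, kernel) -> int:
    """Apply the sign-based separating function to the two projections of x."""
    p1 = project(x, training_samples, alphas[0], kernel)  # onto v1
    p2 = project(x, training_samples, alphas[1], kernel)  # onto v2
    if p1 > 0 and p2 > 0:
        return 1  # first denomination
    if p1 > 0 and p2 < 0:
        return 2  # second denomination
    if p1 < 0 and p2 > 0:
        return 3  # counterfeit of the first denomination
    return 0      # no class: reject
```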

[0118] Classification of coins of an unknown denomination is then performed as follows. The inserted coin is sensed, and measurements representative of the material, thickness and diameter are obtained, as for the samples. A feature vector is then derived from the measured values. The feature vector is then projected onto the calculated eigenvectors (using equation (21)), and the coin is classified in accordance with the projection values and the separating function, as described above.

[0119] The analysis of the sample values for the initial data analysis and the derivation of the separating function can be done, for example, using a microprocessor. Similarly, the classifier 6 may be a microprocessor.

[0120] As an alternative, the classifier 6 may be a neural network, such as a probabilistic neural network, or a perceptron. For example, the neural network may include N−1 linear output neurones and M hidden neurones, where every kernel computation is a hidden neurone. Then the input weights are the values $x_{pq}$, and the coefficients α are the weights between the hidden neurones and the output layer.
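Under that correspondence, equation (21) is exactly a one-hidden-layer network: each of the M hidden neurones computes one kernel evaluation against a stored training sample, and the α coefficients form the hidden-to-output weights. An illustrative sketch:

```python
import numpy as np

def network_forward(x, stored_samples, alphas, kernel) -> np.ndarray:
    """Hidden layer: M kernel evaluations; output layer: N-1 linear neurones."""
    hidden = np.array([kernel(s, x) for s in stored_samples])  # M hidden activations
    return alphas @ hidden                                     # weights alpha, per (21)
```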

[0121] Also, the classifier may be a linear classifier, or a SupportVector Machine.

[0122] The methods of the embodiment described above are equally applicable to banknotes, or indeed to the classification of other sorts of items. Other methods of solving (10), for example by decomposing K using eigenvector decomposition, are possible.

[0123] In the embodiment, a non-linear mapping to a higher-dimensional space is used. A linear mapping could be used instead. Also, the mapping could be to a lower-dimensional space, or to a space of the same dimension as the feature vector space.

1. A method of deriving a classification for classifying items of currency into two or more classes comprising measuring known samples for each class and deriving feature vectors from the measured samples, selecting a function corresponding to a mapping of the feature vector space to a second space, mapping feature vectors to image vectors, and deriving coefficients representing N−1 axes, where N is the number of classes, in the second space, obtaining values representing the projections of the image vectors for the measured samples onto the N−1 axes, and using those values to derive a separating function for separating the classes equivalent to a separating function in the second space.
2. A method as claimed in claim 1 wherein the mapping is a non-linear mapping.

3. A method as claimed in claim 1 or claim 2 wherein the second space is higher-dimensional than the first space.

4. A method as claimed in any one of claims 1 to 3 wherein the coefficients are derived by optimising the separation of the groups of image vectors for each class with respect to the axes.

5. A method as claimed in any one of claims 1 to 4 comprising deriving a matrix V, where V is the covariance matrix in the second space, and a matrix B, where B is the covariance matrix of the class centres in the second space, deriving the solutions to the equation λVv=Bv, and deriving said coefficients from the solutions v.

6. A method as claimed in any one of claims 1 to 5 wherein said function expresses a dot product in the second space in terms of a function on two elements of the feature vector space.

7. A method as claimed in claim 6 wherein said function is k(x,y) where $k(x,y) = (x \cdot y)^d$.

8. A method as claimed in claim 6 wherein said function is k(x,y) where $k(x,y) = \exp\left(-\frac{\|x-y\|^2}{\sigma^2}\right)$.


9. A method as claimed in claim 6 wherein said function is k(x,y) where $k(x,y) = \tanh((x \cdot y) + \theta)$.

10. A method as claimed in claim 6 wherein said function is k(x,y) where $k(x,y) = \frac{1}{1 + e^{-((x \cdot y) + \theta)}}$.


11. A method for classifying an item of currency comprising measuring features of the item, generating a feature vector from the measured values, and classifying the item using a classifier derived by a method according to any one of claims 1 to 10.

12. An apparatus for classifying items of currency comprising measuring means for measuring features of an item of currency, feature vector generating means for generating a feature vector from the measured values, and classifying means for classifying the item using a classifier derived according to the method of any one of claims 1 to 10.

13. An apparatus for classifying items of currency comprising measuring means for measuring features of an item of currency, feature vector generating means for generating a feature vector from the measured values, and classifying means for classifying the item using a function corresponding to a mapping of the feature vector space to a second space, mapping feature vectors to image vectors, and coefficients representative of N−1 axes, where N is the number of classes that can be classified by the apparatus, in the second space, and a function equivalent to a separating function in the second space.

14. An apparatus as claimed in claim 13 wherein the classifying means comprises means for deriving values representing the projection of the image of the feature vector of the measured item onto the or each axis.

15. An apparatus as claimed in any one of claims 12 to 14 wherein the classifying means comprises a neural network.

16. An apparatus as claimed in any one of claims 12 to 15 comprising a coin inlet, and wherein the measuring means comprises sensor means for sensing a coin.

17. An apparatus as claimed in claim 16 wherein the sensor means is for sensing the material and/or the thickness and/or the diameter of a coin.

18. An apparatus as claimed in any one of claims 12 to 14 comprising a banknote inlet, and wherein the measuring means comprises sensor means for sensing a banknote.

19. An apparatus as claimed in claim 18 wherein the sensor means is for sensing the intensity of light reflected from and/or transmitted through a banknote.

20. A coin validator comprising an apparatus as claimed in any one of claims 12 to 17.

21. A banknote validator comprising an apparatus as claimed in any one of claims 12 to 14 or 18 or 19.